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We show how to extend a recently proposed muhi-level Monte Carlo approach to the continuous 
time Markov chain setting, thereby greatly lowering the computational complexity needed to compute 
expected values of functions of the state of the system to a specified accuracy. The extension is non- 
trivial, exploiting a coupling of the requisite processes that is easy to simulate while providing a small 
variance for the estimator. Further, and in a stark departure from other implementations of multi-level 
,— — I Monte Carlo, we show how to produce an unbiased estimator that is significantly less computationally ex- 

p/ pensive than the usual unbiased estimator arising from exact algorithms in conjunction with crude Monte 

P^ Carlo. We thereby dramatically improve, in a quantifiable manner, the basic computational complexity 

of current approaches that have many names and variants across the scientific literature, including the 
Bortz-Kalos-Lebowitz algorithm, discrete event simulation, dynamic Monte Carlo, kinetic Monte Carlo, 
the n-fold way, the next reaction method, the residence-time algorithm, the stochastic simulation algo- 
Ch rithm, Gillespie's algorithm, and tau-leaping. The new algorithm applies generically, but we also give an 

example where the coupling idea alone, even without a multi-level discretization, can be used to improve 
fT^ efficiency by exploiting system structure. Stochastically modeled chemical reaction networks provide 

J> a very important application for this work. Hence, we use this context for our notation, terminology, 

natural scalings, and computational examples. 
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1 Introduction 



. ^H This paper concerns the efficient computation of expectations for continuous time Markov chains. Specifi- 

/S cally, we extend the multi-level Monte Carlo approach of Giles ifTSl . with related earlier work by Heinrich 

^ 1241, to this setting. We study the wide class of systems that can be written using the random time change 



representation of Kurtz [il5i Chapter 6] 11321 in the form 

R / rt 



X{t) = X(0) + Y."^k{j ^k{X{s))d.^ Ck, (1) 

where the Y^ are independent unit-rate Poisson processes, Cfc G ^'^, and the functions A^ are the associated 
intensity, or propensity, functions. While such models are used in nearly all branches of the sciences, 
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especially in the studies of queues and populations, their use has recently exploded in the biosciences, and 
we use this application area for the setting of our work. We will formally introduce these models in Section 
[2j however we begin by demonstrating how two different models, one from chemistry and one from queuing, 
can be represented via ([T]). 

First, consider a linear reversible chemical network 

Si ^ 5*2, 

in which molecules of type 5i convert to molecules of type S2 at rate kiXi, where Xi is the number of 5*1 
molecules, and molecules of type S2 convert to 5i at rate K2X2. Here we are assuming the system satisfies 
mass action kinetics, see Section [2] The usual stochastic model, written in the framework of ([TJ, is then 

X{t) = X{0) + Yi (j KiXi{s)d.^ ( ~^ ) +>2 f /" t^2X2{s)d.^ ( -1 ) • 

Next, consider an M/M/k queue in which arrivals are happening at a constant rate A > 0, and there are 
k servers, with each serving at a rate /i > 0. Letting X{t) denote the number of customers in the queue at 
time t, 

X{t) = X(0) + Yi (At) -Y2U f {X{s) A A;) ds) 

where we define a Ab = min{a, b}. 

There are multiple algorithms available to compute exact sample paths of continuous time Markov 
chains, and, though they are all only slight variants of each other, they go by different names depending upon 
the branch of science within which they are being applied. These include the Bortz-Kalos-Lebowitz algo- 
rithm, discrete event simulation, dynamic Monte Carlo, kinetic Monte Carlo, the n-fold way, the residence- 
time algorithm, the stochastic simulation algorithm, the next reaction method, and Gillespie's algorithm, 
where the final two are the most commonly referred to algorithms in the biosciences. As the computational 
cost of exact algorithms scales linearly with the number of jump events (i.e. reactions), such methods be- 
come computationally intense for even moderately sized systems. This issue looms large when many sample 
paths are needed in a Monte Carlo setting. To address this, approximate algorithms, and notably the class 
of algorithms termed "tau-leaping" methods introduced by Gillespie f2T| in the chemical kinetic setting, 
have been developed with the explicit aim of greatly lowering the computational complexity of each path 
simulation while controlling the bias Il3ll5ll6l l27ll34ll35]| . 

A common task in the study of stochastic models, and the main focus of this paper, is to approximate 
¥,f{X{T)), where / is a scalar- valued function of the state of the system which gives a measurement of 
interest. For example, the function / could be: 

1. f{X{T)) = Xi{T), yielding estimates for mean values, or 

2. f{X{T)) = Xi{T)Xj{t), which can be used with estimates for the mean values to provide estimates 
of variances (when i = j) and covariances (when i 7^ j), or 

3. f{X{T)) = l{x{T)eB}y the indicator function giving 1 if the state is in some specified set. Such 
functions could also be used to construct histograms, for example, since ¥,f{X{T)) = P{X{T) G 
B}. 

Suppose we use an exact simulation algorithm to approximate E/(X(T)) to 0(e) accuracy in the sense 
of confidence intervals. To do so, we need to generate n = 0(e~^) paths so that the standard deviation of 
the usual Monte Carlo estimator, 

1 " 



where Xy^ are independent realizations generated via an exact algorithm, is 0(e). If we let A^ > be the 
order of magnitude of the number of computations needed to produce a single sample path using an exact 
algorithm, then the total computational complexity becomes 0{Ne~''^). (Here, and throughout, we work in 
terms of expected computational complexity.) 

When N ^ I, which is the norm as opposed to the exception in many settings, it may be desirable 
to make use of an approximate algorithm. Suppose ¥,f{X{T)) — E/(Z/j(r)) = 0{h), where Zfi is an 
approximate path generated from a time discretization with a magnitude of h (i.e. we have a weakly order 
one method). We first make the trivial observation that the estimator 

1 " 

t'n = -T.f(^h,[J]iT)), (2) 

where ^h,y] are independent paths generated via the approximate algorithm with a step size of h, is an 
unbiased estimator ofEf{Zh{T)), and not Ef{X{T)). However, noting that 

Ef{X{T)) -nn= [^f{X{T)) - Ef{Zh{T))] + [E/(Z^(r)) - ^„] , (3) 

we see that choosing h = 0(e), so that the first term on the right is 0(e), and n = 0(e~^), so that the 
standard deviation is 0(e), delivers the desired accuracy. With a fixed cost per time step, the computational 
complexity of generating a single such path is 0{e~^) and we find that the total computational complexity 
is 0{e^^). Second order methods lower the computational complexity to 0(e^^'^), as h may be chosen to 
beO(ei/2). 

The discussion above suggests that the choice between exact or approximate path computation should 
depend upon whether e^^ or A^ is the larger value, with an exact algorithm being beneficial when N < e^^. 
It is again worth noting, however, that the estimators built from approximate methods are biased, and while 
analytic bounds can be provided for that bias ||5j |6l |34l these are typically neither sharp nor computable, 
and hence of limited practical value. The exact algorithm, on the other hand, trivially produces an unbiased 
estimator, so it may be argued that e^^ ^ A^ is necessary before it is worthwhile to switch to an approximate 
method. 

In the diffusive setting the multi-level Monte Carlo approach has the remarkable property of lowering 
the standard 0(e~^) cost of computing an 0(e) accurate Monte Carlo estimate of Kf{X(T)) down to 
0(e^^ log(e)^) ifTSl . Here, we are assuming that a weak order one and strong order 1/2 discretization 
method, such as Euler-Maruyama, is used. Further refinements have appeared in ||T9ll20ll25ll28ll30ll . and 
the same ideas have been applied to partial differential equations |9nl3j|. A key motivation for multi-level 
Monte Carlo is that optimizing the overall expected value computation is a different, and typically more 
relevant, goal than optimizing along each path. Computing an expectation using only an exact algorithm 
(or an algorithm with a very fine time-step) can require a large number of paths and an extremely large 
number of random variables and state updates. In general, the total number of paths cannot be reduced. The 
computational benefits of multi-level Monte Carlo arise because the number of random variables and state 
updates needed to approximate the expectation can be drastically reduced by averaging over a very carefully 
chosen combination of coordinated realizations, many of which are much cheaper to compute than an exact 
realization. 

In this paper we extend the multi-level approach to the continuous time Markov chain setting, and espe- 
cially the stochastic chemical kinetic setting. The extension involves a non-trivial coupling of the requisite 
processes that is easy to simulate while providing a very small variance for the estimator. In fact, showing 



the practical importance of the coupling (found in this paper in both equations ( [T8| ) and (p2|)), which was 
first used in [33] and later in [5] as an analytical tool and subsequently in [1] towards the problem of com- 
puting parameter sensitivities, could be viewed as the most important contribution of this paper. Further, 



and in a stark departure from other implementations of multi-level Monte Carlo, we provide a second multi- 
level Monte Carlo algorithm which exploits the representation ([T]l to produce an unbiased estimator giving 
the desired accuracy with significantly less computational complexity than an exact algorithm alone. The 
authors believe that this unbiased multi-level Monte Carlo will become a standard, generic algorithm for 
approximating expected values of continuous time Markov chains, and especially stochastically modeled 
chemical reaction networks. 

We emphasize that the gains in computational efficiency reported in this work apply to generic models, 
and do not rely on any specific structural properties. However, the ideas have the potential to be fine-tuned 
further in appropriate cases; for example by exploiting known analytical results or multi-scale partitions. 
We provide such an example in Section [9] We also emphasize that our complexity analysis does not involve 
asymptotic limits. In particular, we do not consider infinitely large system size, where stochastic effects 
vanish, or infinitesimally small discretization time-step, where the benefits of an approximate method evap- 
orate. 

The outline of the remainder of the paper is as follows. In Section [2j we consider stochastically mod- 
eled chemical reaction networks, which is our main application area, discussing how such models can be 
represented via ([T]l. In Section [3j we introduce an equivalent model to ([T]l that incorporates the natural tem- 
poral and other quantitative scales. Consideration of such a scaled model is critical for realistic quantitative 
comparisons of accuracy versus cost for computational methods, though it plays no role in the actual sim- 
ulations. In Section [4j we briefly review Euler's method, often called tau-leaping in the chemical kinetic 
setting. In Section [5] we review the original multi-level Monte Carlo method. In Section [6} we extend 
multi-level Monte Carlo to the continuous time Markov chain setting in two different ways. In the first, 
exact algorithms are not used and we are led to an efficient method with an unquantified bias. In the second, 
exact algorithms play a key role and allow us to develop unbiased estimators. In both cases, we quantify 
precisely the generic computational efficiencies obtained, relative to standard Monte Carlo. In Section|7} we 
provide the delayed proofs of the main analytical results of Section |6] In Section [8} we briefly discuss some 
implementation issues. In Section |9] we provide computational examples demonstrating our main results. 



Finally, in Section 10 we provide some brief conclusions. 



2 The basic stochastic model for chemical reaction networks 

In this section we discuss how the basic stochastic model for chemical reaction networks can be represented 
via ([T]) for suitable choices of A^ and (k. A chemical reaction network consists of the interaction of multiple 
species, {Si,. . . , Sd}, through different possible reactions. If we denote by (k £ '^'^ the change to the state 
of the system after each occurrence of the kth reaction, then we have 

x{t) = x{o) + Y,Rk(.tKk, 

k 

where Xj (t) gives the number of molecules of Si at time t, and R^ (t) is the number of times the kth reaction 
has taken place up until time t. To model R^, each reaction channel is assumed to have an associated 
intensity, or propensity, function, A^ : M'^ — )• M>o, and for the standard Markov chain model, the number of 
times that the kth reaction occurs by time t can then be represented by the counting process 

Rk{t)=Yjf Xk{X{s))ds 

where the Yk are independent unit-rate Poisson processes; see, for example, ll32ll . |[T5l Chapter 6], or the 
recent survey (71. The state of the system then satisfies ([T]). This formulation is termed a "random time 



change representation" and is equivalent to the "chemical master equation representation" found in much of 
the biology and chemistry literature. 

A common choice of intensity function for chemical reaction systems, and the one we adopt throughout, 
is that of mass action kinetics. Under mass action kinetics, the intensity function for the kth reaction is 



h{x) = Kfc JJ 



\ {Xi -i^kiV- 



I ^{Xi>'^ki}' 



(4) 



where u^i denotes the number of molecules of Si required for one instance of the reaction. Note that 
^k{x) = whenever x, < and uj^i / 0. We note that none of the core ideas of this paper depend upon the 
fact that Afc are mass-action kinetics and the assumption is made for analytical convenience and historical 
consistency. 

This model is a continuous time Markov chain in Z'^ with generator 



{Af){x) = Y^ \k{x)U{x + a) - fix)), 



where / : Z'^ — ;■ M. Kolmogorov's forward equation, termed the chemical master equation in much of the 
biology literature, for this model is 



d_ 
dt 



P{x, t\TT) = Y^ \k{x - Cfc)l|^_^^ezd jP(x - Cfc, t|vr) - ^ Afc(x)P(x, i|7r), 



where for x S Z>q, P{x, t|7r) represents the probability that X{t) = x, conditioned upon the initial distri- 
bution vr. 

Example 1 

To solidify notation, we consider the network 






Si T± 52, 252 ^ 53, 



where we have placed the rate constants k^ above or below their respective reactions. For this example, 
equation ([T]) is 

X{t) = X(0) + ^1 ( / HlXi{s)ds\ 1 + ^2 ( /" K2X2{s)ds\ 

+ y^ij K^X2{s){X2{s)-l)d^ 

Using Ci = [—1,1, 0]^, C2 = [1, —1, 0]"^, and C3 = [0, —2, 1]-^, the generator A satisfies 

{Af){x) = KiXlUix + Cl) - fix)) + K2X2{f{x + C2) " /(x)) + K3X2(x2 " l)(/(x + (s) " f{x)). 



3 Scaled models 

To quantify the relative computational complexity of different methods, it is important that the natural scal- 
ings of a model be taken into account. However, we stress that such a change to the representation of the 



model does not change the simulation — we simulate the unsealed model but analyze the methods on an 
appropriately scaled version. 

Letting N be some natural parameter of the system, which is usually taken to be the abundance of the 
most abundant component, we scale the model by setting X^ = N^"'^Xi, where aj > is chosen so that 
Xj^ = 0(1). The general form of such a scaled model is 



X'^ (t) = X 



N 



k 



N^ / N'>'Xk{X'\s))ds Ck 



'■N 



(5) 



where 7 and c^ are scalars, |Cfc | = 0{N^'^'^), and both X and \k{X ) are of order one. Note that we are 
explicitly allowing for \C^\ to be smaller than A^~^*=, a point made explicit in and around equation Q. We 
note that we should write A^, as the resulting intensity function may depend upon N, though we drop the 
superscript N for notational convenience. It is now natural to take 



A^ 



iV7 Y^ ATCfc 



as the order of magnitude for the number of computations required to generate a single path using an exact 
algorithm. We will demonstrate how to arrive at such a scaled model for chemical systems below, however 
we first discuss the parameter 7. 

The parameter 7 should be interpreted as being related to the natural time-scale of the model. That is, if 
7 > then the shortest timescale in the problem is much smaller than 1 , while if 7 < it is much larger. 
The analysis in this paper is most applicable in the case that 7 < 0, for otherwise the error bounds grow 
quite rapidly. However, and as will be demonstrated in the examples section, the methods developed can 
still behave very well even when 7 > 0, pointing out that the present analysis does not fully capture the 
behavior of the methods. 

We will show how to derive a model of the form Q in the case of chemical reaction networks with mass 
action kinetics. Let N ^ 1, where N is the abundance of the most abundant species, or some other large 
parameter. Suppose we have a model of the form 



X{t)=X{Q) + Y,Ykl^f^K 



'k{X{s))ds Ck, 



where the A'^ are of the form 



Afc(a;) 



n 



l^ki 



For each species, define the normalized abundance by X^ {t) = N~'^^Xi{t), where aj > should be 
selected so that Xf = 0(1). Here Xf may be the species number (a, = 0), the species concentration, 
or something else. Since the rate constants may also vary over several orders of magnitude, we write 
k'j. = KkN^'' where the /3fc are selected so that k^ = 0(1). Under the mass-action kinetics assumption, we 
have that A'^(X(s)) = A^^''+'^''"Afc(X^(s)), where A^ is deterministic mass-action kinetics with parameter 
t^k II29I . and we recall that v^ is the source vector of the fcth reaction. Our model has therefore become 



x^^(t) = x^^(o) + > y, 



k 



N^k+i^''-'^Xk{X^{s))ds 



^k ! 



where C^^ = Cfcj/^"' (so (^ is the scaled reaction vector). Define 7 G M via 

7= max {(3k + i^k ■ a - at} . 



Then, for each k define 

Cfc = /5fc + ffc • a - 7. (6) 

With these definitions, our chemical model becomes Q. 

Returning to the general setting of ([5]), for each k we define 

Pk =min{Qi : Cfei / 0}, (7) 

so that \C,k\ ~ N^P'', and define p = m.m{pk}. We have that p > 0, and by the choice of 7 we have 
Ck — Pk ^ for all k. Further, we point out that 7 is chosen so that Cfc = for at least one k. Also, if 
1 1 V/ 1 1 00 is bounded, then 

with Ck — Pk = for at least one k. Finally, it is worth explicitly noting that the classical scaling holds if 
and only if c^ = p^ = 1 and 7 = l5ll3Tl. 

Remark 1. We emphasise that the models ([T]l and ([5]) are equivalent in that X'^ is the scaled version of 
X. The scaling is essentially an analytical tool as now both X'^ and \k{X^ {■)) are 0(1), and in Section 
[7] it will be shown how the representation Q is useful in the quantification of the behavior of different 
computational methods. However, we stress that the scaling itself plays no role in the actual simulation of 
the processes, with the small exception that it can inform the decision for the size of the time step of an 
approximate method. 

Example 2 

To solidify notation, consider the reversible isometry 

100 

5*1 <^ 5*2 
100 

with Xi{Q) = ^2(0) = 10,000. In this case, it is natural to take N = 10,000 and ai = 02 = 1- As the 
rate constants are 100 = y/ 10,000, we take Pi = 1^2 = 1/2 and find that 7 = 1/2 and pi = P2 = 1- The 
normalized process X^ satisfies 

X^{t) = Xf (0) - Yi{n'I^N j\^{s)d.^ 1 + Y2{n''^N j\2 - X^{s))ds^ 1 

where we have used that X^ + X2 = 2. 

Example 3 

We provide a deterministic example tofiirther explain the use of the scalings. Consider the ordinary differ- 
ential equation 

x{t) = XN - px{t), 

where X, p = 0{1), N ^ 1, and xq = 0{N). Of course, the solution to this system is 

, , XN fXN \ „j 

/^ V /^ / 

However, defining x^ = N~^x, we see that x^ satisfies 

x^{t) = X-px^{t), 



with Xq = 0(1). Solving yields 

fJ- \fJ' ) 

Note, then, that solving for either x or x automatically yields the other after scaling. Also note the 
important property that in the ODE governing x, the driving force, XN, was an extremely large value. 
However, the forcing function ofx, which is simply X, was 0(1). 

Example [3] points out an important feature: the functions A^ of ([5]), together with their derivatives, are 
much better behaved, in terms of their magnitude, than the intensity functions of the original model ([T]). 
Therefore, after possibly redefining the kinetics by multiplication with a cutoff function, see, for example, 
imm, it is reasonable to assume that each Xk is, in fact, a globally Lipschitz function of X^ . We formalize 
this assumption here. 

Running assumption: Throughout, we assume that the functions Xk of (|5]) are globally Lipschitz. 



4 A review of Euler's method in the current setting 

We briefly review Euler's method, termed tau-leaping in the chemical kinetic literature EH, as applied to 
the models ([T]l, and equivalently Q. The basic idea of tau-leaping is to hold the intensity functions fixed 
over a time interval [t„,t„ + h] at the values Xk{X{tn)), where X{tn) is the current state of the system, 
and, under this assumption, compute the number of times each reaction takes place over this period. As 
the waiting times for the reactions are exponentially distributed this leads to the following algorithm, which 
simulates up to a time of T > 0. Below and in the sequel, for x > we will write Poisson(a;) to denote 
a sample from the Poisson distribution with parameter x, with all such samples being independent of each 
other and of all other sources of randomness used. 

Algorithm 1 (Euler tau-leaping). Fix h > 0. Set Zh{0) = xq, to = 0, n = and repeat the following until 

tn = T: 

(i) Set tn+i =tn + h. If tn+1 > T, set t„+i = r and /i = T - t„. 
{ii) For each k, let A^ = Poisson(Afc(Z/i(t„))/i). 
{in) Set Zh{tn+i) = Zh{tn) + J2k^kCk- 
{iv) Set n ■^ n + I. 

Several improvements and modifications have been made to the basic algorithm described above over 
the years. Some concern adaptive step-size selection along a path | jmi22l |. Others focus on ensuring non- 
negative population values |l3l[T0l[T2l|37l. The latter issue is easily dealt with in our context; for example, it 
is sufficient to return a value to zero if it ever goes negative in the course of a simulation. This is discussed 
further in subsection 16.21 

Analogously to ([T]l, a path-wise representation of Euler tau-leaping defined for all t > can be given 
through a random time change of Poisson processes: 

Zhit) = ZhiO) + ^Yk(f Xk{Zhor]{s))ds]Ck, (8) 



5 

where the Yk are as before, and 7](s) = — h. Thus, Z^ o r/(s) = ZhUn) Htn < s < tn+i- Noting that 

h. 



rtn+i " 

/ Xk{Zh o r]{s))ds = V" Xk{Zh{ti)){ti+i - t 



explains why this method is called Euler tau-leaping. Following ([5]l, for each i G {1, . . . , d} we let Z^- = 
N^^^Zh^i, so the scaled version of ([8]) is 

Z^it) = Zl^iO) + ^ n (^N^ 1^ N^^\,{Z^ o ^{s))ds^ Cf , (9) 

where all other notation is as before. We again stress that the models (|8) and ([9]) are equivalent, with ([8]l 
usually giving the counts of each component and (|9]) providing the normahzed abundances. 

Remark 2. Historically, the time discretization parameter for the methods described in this paper has been 
T, leading to the name "r-leaping methods." We choose to break from this tradition so as not to confuse r 
with a stopping time, and we denote our time-step by h to be consistent with much of the numerical analysis 
literature. 

5 A review of multi-level Monte Carlo 

Given a stochastic process, X(-), let / : M"^ — )■ M be a function of the state of the system which gives a 
measurement of interest. Our task is to approximate E/(X(T)) efficiently. As discussed in SectionfTl using 
the "crude Monte Carlo" estimator Q with a weakly first order method will provide an estimate with an 
accuracy of 0(e), in the sense of confidence intervals, at a computational cost of 0(e~^). 

In multi-level Monte Carlo (MLMC) paths of varying step-sizes are generated and are coupled in an 
intelligent manner so that the computational complexity is reduced to 0(e~^(log e)^) ifTSl . Sometimes even 
the log(e) terms can be reduced further |17|. Suppose we have an approximate method, such as Euler's 
method in the diffusive setting, which is known to be first order accurate in a weak sense, and 1/2 order 
accurate in a strong L^ sense. The MLMC estimator is then built in the following manner. For a fixed 
integer M, and £ G {0, 1, ... , L}, where L is to be determined, let h^ = TM~^. Reasonable choices for M 
include 2, 3, and 4. We will denote Z^ as the approximate process generated using a step-size of h^. Choose 
L = 0(ln(e-i)), so that hi = 0(e) and E/(X(r)) - E/(ZL(r)) = 0(e), and the bias (i.e. the first term 
on the right hand side of Q) is of the desired order of magnitude. We then have 

L 

EfiZUT)) = E[f{Zo{T))] + ^E[/(Z,(r)) - /(Z,_i(r))], (10) 

e=i 

where the telescoping sum is the key feature to note. We will now denote the estimator of E[/(Zo(T))] 
using no paths by Qo, and the estimator of E[/(Z£(T)) — f{Zg_i{T))] using n£ paths as Qi. That is 



Qo = -Y.f(^m(T))^ and 4;iL'_^(/(z,,[,,](r))-/(z,_i,[,](T))), (11) 



1 

where the important point is that both Z^ [j](r) and Z^_i jj](T) are generated using the same randomness, 
but are constructed using different time discretizations (see ifTSl l26l for details on how to do this in the 
diffusive setting). We then let 

L 

Q = J^Q^, (12) 



be the unbiased estimator for E[/(ZL(r))]. Assuming that we can show \/ar{f{Ze{T)) - f{Zi_i{T))) = 
0{he), which follows if the method has a strong error of order 1/2 and / is Lipschitz, we may set 

Ui = 0{€^'^Lhe), 

which yields Var(Q) = 0{e^), but with a total computational complexity of 0(e^^(loge)^). We make the 
following observations. 

1. The gains in computational efficiency come about for two reasons. First, a coordinated sequence of 
simulations are being done, with nested step-sizes, and the simulations with larger step-size are much 
cheaper than those with very fine step sizes. Second, while we do still require the generation of paths 
with fine step-sizes, the variance of f{Zg) — f{Z£_i) will be small, thereby requiring significantly 
fewer of these expensive paths in the estimation of Qi of ( [TT] ). 

2. For the analysis in |[T8l . it is necessary to know both the weak (for the choice of hi) and strong (for 
the variance of Qi) behavior of the numerical method, even though we are only solving the weak 
approximation problem. 

3. The estimator ( [T2] ) is a biased estimator of E/(X(T)), and the number of levels L was chosen to 
ensure that the bias is within the desired tolerance. 

6 Multi-level Monte Carlo for continuous time Markov chains 

We now consider the problem of estimating Kf(X^{T)), where X^ satisfies the general system Q. We 
again stress that as X^ of Q is equivalent to the process X of Q, efficiently approximating values of the 
form E/(X^(r)), for suitable /, is equivalent to efficiently approximating values of the form 'Eg{X{T)), 
for suitable functions g. The scaled systems are easier to analyze because the temporal and other quantitative 
scales have been made explicit. 

Recall that N = N^ ^^ N^'' gives the order of magnitude of the number of steps needed to generate 
a single path using an exact algorithm. As discussed in Section MJ to approximate E/(X^(r)) to an order 
of accuracy of e > using an exact algorithm (such as Gillespie's algorithm or the next reaction method) 
combined with the crude Monte Carlo estimator, we need to generate e~^ paths. Thus, we have a total 
computational complexity of 0(A^e~^) . 

We will now extend the core ideas of multi-level Monte Carlo as described in Section[5]to the continuous 
time Markov chain setting with Euler tau-leaping, given in Q, as our approximation method. We again fix 
an integer M > 0, and for (. G {^q, ■ ■ ■ ,L}, where both Iq and L are to be determined, let hi = TM~^. We 
then denote by Z^ the approximate process Q generated with a step-size of h^. By [6], for suitable / 

E/(X^(r))-E/(Zf(T)) = 0(/i,). 

Choose L = 0(ln(e^^)), so that h^ = 0(e) and the bias is of the desired order of magnitude. We then 
introduce another telescoping sum 

L 

EfiZnT)) = nf{Z^,iT))]+ Y. E[/(Z,^(r))-/(Zf,(r))]. (13) 

We will again denote the estimator of E[/(Z^(r))] using no paths by Qq, and the estimator of E[/(Z^(T)) — 
f{Z^_^{T))] using Hi paths by Qe. That is 

Qo = -Y.fKd^))^ and Q,'^-Y.^f{ZZ^{T))-f{Zl,^^^{T))), (14) 

«=i 1=1 

10 



where we hope that Z^r.-, and Z^-^ ,■-, can be generated in such a way that Var(Q£) is small. We will then let 

L 

£=eo+i 

be the unbiased estimator for K[f{Z^{T))]. The choices for n£ will depend upon the variances of Q^. 

The main requirements for effectively extending MLMC to the current setting now come into focus. 
First, we must be able to simulate the paths Zf and Z^_^ simultaneously in a manner that is efficient and 
produces small variances between the paths. Second, we must be able to quantify this variance in order to 



control the variance of the associated Q£ terms of ( [T4| ). Both requirements demand a good coupling of the 
processes Zf and Z^-^. 

We motivate our choice of coupling by first treating two simpler tasks. First, consider the problem of 
trying to understand the difference between Zi{t) and Z2{t), where Zi, Z2 are Poisson processes with rates 
13.1 and 13, respectively. A simple approach is to let Y\ and I2 be independent, unit-rate Poisson processes, 
set 

Ziit) = yi(13.1t) and Z2{t) = ^2(13*), 

and consider Zi{t) — Z2{t). Using this representation, these processes are independent and, hence, not 
coupled. Further, the variance of their difference is the sum of their variances, and so 

Var(Zi(t) - Z2{t)) = Var(Zi(t)) + Var(Z2(t)) = 26.lt. 

Another choice is to let Yi and Y2 be independent unit-rate Poisson processes, and set 

Zi{t) = Yi{l2,t) + ^2(0.1*) and Z2{t) = Yi{l2,t), 

where we have used the additivity property of Poisson processes. The important point to note is that both 
Zi and Z2 are using the process yi(13t) to generate simultaneous jumps. The process Zi then uses the 
auxiliary process 1^2(0. It) to jump the extra times that Z2 does not. The processes Zi and Z2 will jump 
together the vast majority of times, and hence are tightly coupled; by construction Var(Zi(t) — Z2{t)) = 
Var(l2(0.1t)) = O.lt. More generally, if Zi and Z2 are instead inhomogeneous Poisson processes with 
intensities f{t) and g{t), respectively, then we could let Yi, Y2, and I3 be independent, unit-rate Poisson 
processes and define 

Zi (t) = Yi (j f{s) A g{s)ds\ + Y2 (j f{s) - {f{s) A g{s)) ds 
Z2{t) = Y^ (j f{s) A g{s)ds\ + Fa (j 9{s) - {f{s) A g{s)) ds 
where we are using that, for example, 

Fi y^ f{s) A g{s)ds^ + Y2 (^ fis) - ifis) A g{s)) ds^ = ^ (^ fi^)ds 

where y is a unit rate Poisson process and we recall that a Ab = min{a, b}. 

We now return to the main problem of coupling the processes Z^ and Z^^, each satisfying Q with 
their respective step-sizes. We couple the processes Zf and Z^-^^ in the following manner, which is similar 
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to a coupling originally used in fST], and later in |T|, as an analytical tool, and subsequently in IH towards 
the problem of computing parameter sensitivities: 

Z^{t) = Zi'iO) + Y, Yk,i (n^N^' f Afc(Zf o r?,(s)) A Afc(Zf 1 o r^,_^{s))ds\ Ck 

+ ^n,2 In^N'^' J XkiZ^ o %(s)) - Afc(Zf o r?,(s)) A Afc(Zf 1 o %_i(s))dsj Cfc^, 
Zf i(t) = ZlM + j; y,,i (iVTiV^^- 1^ Afc(Zf o ,?,(«)) A Afc(Zf 1 o %_i(s))ds) cf 

+ j;n,3 U^N'^' J XkiZt, o r/,_i(s)) - XkiZ^ o %(5)) A Afc(Zf 1 o %_i(s))(is') Cf , 



(17) 

where the 1^,1, ^ £ {1)2,3}, are independent, unit-rate Poisson processes, and for each £, we define r]£{s) = 
[s/he\ h£. Note that we essentially used the coupling of the simpler examples above (pertaining to Zi and 
Z2) for each of the reaction channels. 

The paths of the coupled processes can easily be computed simultaneously and the distributions of the 
marginal processes are the same as the usual scaled Euler approximate paths (|9]) with similar step-sizes. 
More precisely, the system (fT6l)-([T7]) is the scaled version of, and is hence equivalent to, the system 



Ze{t) =Zi{0) + ^ Yk^i i / Xk{Zi o rii{s)) A Xk{Zi^i o rii^i{s))ds j Ck 

+ ^^fc,2 f / Xk{Ze o r]i{s)) - Xk{Zi o r]i{s)) A Afc(Z^_i orii^i{s))ds ] Cfc, 
Z<?_i(t) =Z^_i(0) + ^Yfe,i ( / Xk{Ziorii{s)) AXk{Zi^iorii^i{s))ds\ Ck 



(18) 



+ X]^fc,3f / Xk{Ze_ior]i^i{s)) - Xk{ZiOi]£{s)) AXk{Ze^iorie^i{s))ds] Ck, 
where now the marginal processes are distributionally equivalent to the approximate processes (18) with 



similar step-sizes, and all notation is as before. The natural algorithm to simulate the representation ( [T8| ) 
(and hence ([T6])-([T7]|) to a time T > is the following. 

Algorithm 2 (Simulation of the representation (fTS])). Fix an integer M > 2. Fix hg > and set /if_i = 
M X he. Set Ze{0) = Zi^i{0) = xq, to = 0, n = 0. Repeat the following steps until t„ > T: 

[i] For j = 0, . . . , M - 1, 

(a) Set 

• Ak,l = XkiZi{tn+j X hi)) A Xk{Zi^i{tn)). 

• Ak,2 = Xk{Zi{tn+j X he)) - Ak,i. 

• ^fc,3 = Xk{Z£_i{tn)) — Afc,l- 

(b) For each k, let 

• Afc^i = Poisson(Afc^i/i£). 
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• ^k,2 = Poisson(Afc_2^£)- 

• Afe,3 = Poisson(Afc,3/i£). 

(c) Set 

• Zt{tn + (j + 1) X hi) = Ze{tn + i X /l^) + Efe(Afc,i + Afc,2)a- 

• Ze^i{tn + (j + 1) X /l^) = Ze^i{tn+j X /if) + J2ki^k,l + Afc,3)Cfc- 

(ii) Set tn+l =tn + hi_i. 
{in) Set n -^ n + 1. 

We make the following observations. First, while Algorithm |2] formally simulates the representation 
( 18 1, the scaled version of the process generated via Algorithm |2] satisfies ([T6])-(17i. Second, we do not 



need to update either Z^_i or Afc(Z£_i) during the workings of the inner loop of j = 0, . . . , M — 1. Third, 
at most one of A2,A3 will be non-zero during each step, with both being zero whenever Xk{Zi(tn)) = 
Xk{Z£_i{tn))- Therefore, at most two Poisson random variables will be required per reaction channel at 
each step and not three. Fourth, the above algorithm, and hence the couplings ( [T8| ) and/or (fT6l)-([T7]), is no 
harder to simulate, from an implementation standpoint, than the usual Euler tau-leaping. Fifth, while two 
paths are being generated, it should be the case that max{A2, ^3} is small for each step. Hence the work in 
computing the Poisson random variables will fall on AfciJM which is the same amount of work as would be 
needed for the generation of a single path of Euler tau-leaping. 

In SectionJT] we will prove the following theorem, which is one of our main analytical results. 



Theorem 1. Suppose {Z^ , Z^_^) satisfy ( 16 1 and ( 17 1 with Z^ (0) = Z^_^ (0). Then, there exist functions 
Ci , C2, that do not depend on h^, such that 

supE|Zf (t) - Zf i(i)|2 < CiiN^T)N-P{N''hi) + C2iN^T){N'fhif. 

t<T 

In particular, for 'J < the values Ci{N'^T) and C2{N'^T) may be bounded above uniformly in N. 

Remark 3. The specific forms of Ci{N'^T) and C2{N'^'T) for Theoremflland Theorem |2] below are given 
in Section It] However, we note here that if 7 > 0, the factors Ci{N^T) and C2{N''T) could be huge, 
leading to upper bounds in Theorem [T] and Theorem [2] of no practical use. So we henceforth assume that 
7 < and thus regard Ci(A^T) and C2{N'^T) as constants independent of A^. We note that the classical 
chemical kinetics scaling, with 7 = 0, satisfies this assumption. However, good performance is observed in 
Section |9] with 7 > 0, suggesting that further analysis may extend the range of validity for this method. 

Note that Theorem [T] together with / Lipschitz gives us the estimate 

|Var(/(Z,^(t)))-Var(/(Zfi(t)))| < E\f{Z^{t))-f{Zt,{t))f 

< CE|Zf(t)-Zfi(t)|2 

< C [Ci{N^T)N^P{N^he) + C2{N^T){N^hif] , (19) 



which we will use to control the variance of Q^ in ( 14 1 



'The cost of generating a Poisson random variable generally increases with the size of the mean 
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N 

and the approximate process Z^ . We will later use this coupling to produce an unbiased MLMC estimator. 



Before further exploring MLMC in the current setting, we present a coupling of the exact process X 



We define X^ and Z^ via 






+ Yl Yk,2 U^N'^' I' Afc(X^(s)) - XkiX'^is)) A A,(Zf o r^e{s))ds^ Cf , 

zf{t) =zf (0) + ^ yfe,i (^xm^^ j^ Afc(x^(s)) a a,(z,^ o r/K^))^^) Ci 



(20) 



-AT 

■,k 



(21) 



+ E ^^.3 fiV^iV^^ _^* Afc(Zf o r?,(s)) - A,.(X^(.)) A A,(Z,^ o rj,{s))ds^ Cf , 

where all notation is as before. Note that the distributions of the marginal processes X^ and Z^ are equal 
to those of ([5]) and (|9]l. The unsealed processes satisfy 



X{t) =X{0) + Y n,i (/ Afc(X(s)) A Afc(Z, o %(s))(is') Cfc 

+ Y. ^k,2 (j Xk{X{s)) - Xk{X{s)) A Xk{Ze o ??,(s))rfs ) a, 

Z^t) =ZKO) + 5^n,i (f Xk{X{s)) A Afc(Z^ o rie{s))ds] Ck 

+ Y -^^'3 ( / '^'^(^^ ° ^^(•^)) ~ Xk{X{s)) A Afc(Z<? o rji>{s))ds ] Ck, 



(22) 



which is equivalent to (20) and (2]_), and whose marginal processes have the same distributions as ([T]) and 



The natural algorithm to simulate (22 1, and hence ([20l)-(|2T]), is the next reaction method 121 [161, where 



the system is viewed as having dimension 2d with state {X^ , -^/^)» ^^id ^^^h of the "next reactions" must 
be calculated over the Poisson processes 1^ 1 , 1^2, Yk,z- See lUj for a thorough explanation of how the next 
reaction method is equivalent to simulating representations of the forms considered here. Below, we will 
denote a uniform[0, 1] random variable by rand(0, 1), and we remind the reader that if [/ ~ rand(0, 1), then 
ln(l/C/) is an exponential random variable with a parameter of one. All random variables generated are 
assumed to be independent of each other and all previous random variables. 



Algorithm 3 (Simulation of the representation ([22])). Initialize. Fix h^ > 0. Set X{Q) = Zi{{)) = xq and 
t = 0. Set Zi = Zi{0). Set Ttau = he. For each k £ {1, . . . , R} andi £ {1, 2, 3}, set Pfc,i = ln(l/rfc,i)' 
where r^ j is rand(0, 1), and T^ ^ = 0. 

(i) For each k, set 

• Ak,i = Xk{X{t)) A Xk{Ze). 

• Ak,2 = Xk{X{t)) - Ak,i. 

• ^fc,3 = Xk{Ze) - Ak^i- 
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(a) For each k £ {1, . . . , R} and i G {1, 2, 3}, set 

A, _ / iPk,i - Tk,i)/Ak,^, if Ak,i + 
''^"l oo, ifAfc, = • 

{iii) Set A = miiifc j{ Atfc j}, and let /i = {fc, i} be the indices where the minimum is achieved. 

{iv) Ift + A>rtau, 

(a) Set Zi = ZfXt). 

(b) For each k £ {I, . . . ,R} andi £ {1,2, 3}, set Tk,i = TkA + Ak,i x (Ttau - t). 

(c) Set t = Ttau- 

(d) Set Ttau = Ttau + he. 

(e) Return to step [i) or quit. 

(v) Else, 

(a) Update. For {k, i} = ji, where /x is from {iii), 

• If i = 1, set X{t + A) = X{t) + Cfc and Zi{t + A) = Z^(t) + Cfe- 

• If i = 2, set X(t + A) = X{t) + Cfc. 

• If i = 3, set Zi,{t + A) = Z^(t) + Cfc 

(b) For each A; G {1, . . . , i?} and z G {1, 2, 3}, set T^^i = Tk^i + A^^j x A. 

(c) Set P^ = P^ + ln(l/r), where r is rand(0, 1), and n is from {iii). 

(d) Set t = t + A. 

(e) Return to step (i) or quit. 

The following theorem, which should be compared with Theorem [T] is proven in Section |7] and is our 
second main analytical result. 



Theorem 2. Suppose {X^ , Zf) satisfy ( |20| and ( [2T] ) with X^ {0) = Z^{0). Then, there exist functions 
Ci , C2, that do not depend on he, such that 



supE|X^(i) - Zf (t)l^ < Ci{N^T)N-f{N^he) + C2{N^T){N^he)\ 

t<T 

Moreover, for 7 < the values Ci{N^T) and C2{N^T) may be bounded above uniformly in N. 

We are now in a position to develop MLMC in the stochastic chemical kinetic setting. Recall our 
assumption that 7 < 0, so Ci and C2 in Threorems [ij and |2| are bounded. We return to the Qe terms in 
([14]). Supposing that the test function / is uniformly Lipschitz in our domain of interest (note that this is 



automatic for any reasonable / in the case when mass is conserved), then for £ > £o^ we know from ( 19l 
that 

Var(4) < C— \Ci{N'^T)N-P{N^he) + C2{N^T){N^he)^] . 
ne 

Note that if A^^^ < he, the leading order of the error is the hf term. As a heuristic argument for this 
behavior, note that if N~p < he and N is large while he is small, then the processes are nearing a scaling 
regime in which deterministic dynamics would be a good approximation for the model X'^ . In this case, 
one should expect that the squared difference between two Euler paths should behave like the usual order 
one error, squared. 
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We may now conclude that the variance of the estimator Q defined in ( [T5| ) satisfies 

L 

Var(Q) = Var(4j + J^ Var(Q,) 

f.=f.o+l 

< — + V C— \Ci{N^iT)N-f{N^hi) + C2{N^'T){N^'he 
where -ffo = Var(/(Z7(r))). For /i > we define 



A{h) = N-P{N^h) + {N^hf. (23) 

Letting no = 0(e"^), and for ^ > £o letting 

we see that 

Var(Q) = 0(e2). 

As the computational complexity of generating a single path of the coupled processes {Zf , ^t-i) is 0(/i^^), 
we see that the total computational complexity of the method with these choices of n^ is of order 

L L 

= e-^ I h-,^ + {L- £o) iZ (^"'^^ + ^^^^^) 
\ f.=e.Q+i 

< e-^ (Vo' + H^?N-PN'< + ln(e-i)^^/i,,Ar27^ , (24) 

where we used that 

1=1.0 + 1 1=1 

A more careful choice of n^ can potentially reduce the ln(e) terms further, see for example lITSl . but in 
the present case, the computational complexity will be dominated by e^^hj^ in most nontrivial examples. 
Further, as will be discussed in Section [8j the n^ can be chosen algorithmic ally, by optimizing for a given 
problem. 

6.1 An unbiased MLMC 

We now build an unbiased MLMC estimator for E/(X^(T)) in a similar manner as before with a single 
important difference: at the finest scale, we couple X^ with Z^ . That is, we use the identity 

E/(x^(r)) = E[/(x^(T))-/(zf(r))]+ X; iE[/(^f)-/(^fi)] + iE/(z,^(r)). 

^=4+1 
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For appropriate choices of no, n^, and ue, we define the estimators for the three terms above via 

HE 



'I'E ■ , 

tip — ^ J ^ ' 

4 = 1 



" i=l 

and note that 



Q = Qe+ Y. Q^ + Q^ (26) 



^=^0 + 1 

is an unbiased estimator for E/(X^(T)). Applying both Theorems [I] and yields 

y^riQE) < Ki{N^T)—A{hL), 

UE 

Var(4) < K2{N^T)-A{h^), for ^ G {4 + 1, • • ■ , L}, 

where Ki(N''T) and K2{N"'T) are independent of hp, and, under our assumption that 7 < 0, can be 
bounded uniformly in N. It follows that the choice 

UE = 0{e-^A{hL)), 

ni = 0{e-\L-io)A{h)), ior i G {io, . . . , L}, (27) 

no = 0(e-2), 

gives us 

L 

Var(Q) = Var(Qs) + ^ Var(Q^) + Var(Qo) 

^=^0+1 

L 

= 0(e2)+ Y. 0{e\L - l,)-^) + 0{e') 

^=^0 + 1 
= 0(6^). 

The computational complexity is now of order 

L L 

nEN+ Yl mhj'+nohJ^' = Ne-^A{hL)+ Y ^'\L - to)A{hi)hJ^ + e'^h^^ 

l=to+l ^=^0+1 

= e-2 I NA{hL) + {L- £0) Y (N-'N'' + h,N^^) + /i,-/ 



< e-2 ('iV^(/i,.) + /i-i + ln{e)^ N~PN^ + lnie~^)j^h,,N^A 



(28) 



where we again made use of the inequality ( 25 1. 
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6.2 Some observations 

A few observations are in order. First, in the above analysis of the unbiased MLMC estimator, the weak error 
of the process Zj^ plays no role. Thus, there is no reason to choose hi = 0(e) for a desired accuracy of 
e > 0. Without having to worry about the bias, we have the opportunity to simply choose hi "small enough" 
for Var(X^(-) — Z^ {■)) to be small, which can be approximated with a few preliminary simulations before 
the full MLMC is carried out (see Section|8]for more implementation details). 

Second, one of the main impediments to the use of tau-leaping methods has been the possibility for 
paths to leave the non-positive orthant. In fact, there have been multiple papers written on the subject of 
how to enforce non-negativity of species numbers with 131 HHJ [121 [37J representing just a sample. We note 



that for the unbiased MLMC estimator ( \26\ it almost does not matter how, or even if, non-negativity is 
enforced. So long as the processes are well defined on all of Z*^, for example by defining the intensity 
functions A^ in some reasonable way, and so long as we can still quantify the relations given in Theorems 
[T] and [2] everything above still holds. The cost, to the user, of poorly defining what happens if Z^ leaves 
the positive orthant will simply be the need for the generation of more paths to reduce the variance of the 
(still unbiased) estimator. Of course, this cost could be quite high as negativity of population numbers can 
lead to instability if they have not defined the intensity functions outside the positive orthant in a reasonable 
manner. However, in Section[8]we discuss how intelligent implementation of the method can greatly reduce 
this cost by ensuring that the approximate paths remain stable with high probability. 

Third, inspecting ( |24| ) and ( [28l ) shows that the unbiased MLMC estimator ([26]l has an additional term 
of 0{NA{hi)e^'^) in its computational complexity bound, as compared with the biased MLMC estimator 
^T5\ . The authors feel that NA{hL) would have to be quite substantial to warrant not using the unbiased 
version. 

Fourth, note that we always have the following: 

Computational complexity of unbiased MLMC = O [e^'^{NA{hL) + hj^ + log term) 

(29) 



< O ie-'^N) 



_ Computational complexity of exact algorithm 
with crude Monte Carlo. 

Thus, under our standing assumption 7 < 0, the unbiased MLMC estimator should be the method of choice 
over using an exact algorithm alone together with crude Monte Carlo, which is by far the most popular 
method today. For example, consider the case when the system satisfies the classical scaling, for which 
p = 1, 7 = and Cfc = 1. In this case, N = N and, as there is little reason to use an approximate method 
with a time step that is smaller than the order of magnitude of the wait time between jumps for an exact 
method, we may assume that hi > 1/A^ = N~f. Therefore, in this specific case, A(/ii) = 0{h\) and the 



computational speedup predicted by ( pS) and/or ( |29| ) is of the order 
Speed-up factor 



e-'^N _ N 

-^Nhl + h„l + log(e)) " Nhl + h-^l + log(e) ' 



Thus we have 



Speed-up factor ^ min (/i^^, Nh^^) 



Therefore, even though the method is unbiased, the computational burden has been shifted from the exact 
process to that of an approximate process with a crude time-step. This behavior is demonstrated in an 
example found in Section[9J though on a system not satisfying the classical scaling. 
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Note also that ([29]l holds even if A^, the approximate cost of computing a single path, is not extremely 
large. For example, even if the cost is only in the hundreds, or maybe thousands, of steps per exact path, the 
above analysis points out that if great accuracy is required (so that e^^ is very large), the unbiased MLMC 
estimator will still decrease the computational complexity substantially. It should be pointed out that in 
these cases of moderate N, we will typically have 7 < and so the analysis will hold. 

The conclusion of this analysis, backed up by the examples in Section |9j is that MLMC methods, with 
processes coupled via the representations ( [T8| ) and (|22]), and the unbiased MLMC in particular, produce 
substantial gains in computational efficiency and could become standard algorithms in the sciences. Further 
attention, however, needs to be given to the case 7 > 0, and this will be a focus for future work. 

7 Delayed proofs of Theorems [1] and |2] 

We begin by focussing on the proof of Theorem|2J which is restated here for completeness. 

Theorem 2. Suppose {X^, Zf) satisfy (|20]) and (|2T) with X^(0) = Zf {Q). Then, there exist functions 
Ci , C2, that do not depend on h^, such that 

supE|X^(t) - Z^{t)\^ < Ci{N^T)N-f{N^ht) + C2{N^T){N^hff . 

t<T 

Moreover, for 7 < the values Ci{N'^T) and C2{N"'T) may be bounded above uniformly in N. 
We start with the following lemma. 



Lemma 3. Suppose {X^" ,Zf) satisfy ^ and ([21) with X^ (0) = Zf{0). Then, there exist positive 



constants ci, C2, independent of N, 7, and T, such that for t > 

E\X^{t) - Z^{t)\ < ci {e"^^''* - 1) {N^he). 
Proof Note that 

E\X^{t)-Z^{t)\ 



E 



Y,Yk,2 (^N^N'" ^ XkiX^'is)) - XkiX'^is)) A Xk{Z^ o %(s))ds) Cf 



< 

k 



-T.^k,3 [N^'N^' ^ XkiZf o %(5)) - XkiX^'is)) A XkiZf o Ms))dsj Cf 
Y, iCf I IEn,2 [n^N'^' I^ Afe(X^(s)) - Xk{X^{s)) A XkiZf o r^,{s))ds 



+ En,3 (xm^^ J Xk[Zf o %(s)) - Afc(X^(s)) A Afc(Zf o r],[s))ds 
Y, ICf |iV"iV^'= rE|Afc(X^(s)) - A,(Zf o r^,is))\ds 



k -^0 

< N^C I E\X^{s) - Zf o %(s)|ds, 
Jo 



where C > is some constant and we used that the Xk are assumed to be Lipschitz. Adding and subtracting 
the obvious terms yields 

E\X^{t) - Zf{t)\ < N^C I E\Zf{s) - Zf o T]i{s)\ds + iV^C [ E\X^{s) - Zf (s)|ds. (30) 

Jo Jo 
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The integrand of the first term on the right hand side of ( 30 1 satisfies 



E\Z^is) - Zf o r?,(s)| < Y. ICf lA^^iV^^IE T Afc(Zf (r?,(r))<ir < CiV^/i,, (31) 

where (7 > is a constant, and we recall that A is 0(1) in our region of interest. Collecting the above yields 

E|X^(t) - Zf{t)\ < CiN'^^thi + CsiV^ / E\X^{s) - Zf{s)\ds, 

Jo 

for some positive constants Ci , C2 that are independent of N, 7, and T. The result now follows from 
Gronwall's inequality. D 

We note that Lemma |3] is a worst case scenario due to the appearance of the term N^ in the exponent. 

N~> 

However, considering the network Si — )■ 25"! (exponential growth), shows this to be a sharp estimate. A 
future research direction will be classifying those networks for which this upper bound can be decreased 
substantially. 

We are now in position to prove Theorem |2] 

Proof, (of Theorem [2]) We have 

X^{t) - Z^{t) = M^{t) + / F^{X^{s)) - F^iZf o 7]e{s))ds, 

Jo 

where 

M^(t) ^ Y. ^^.2 U^N^' I Afc(X^(s)) - XkiX'^is)) A Afe(Zf o ^,{s))ds 



WN'" [ XkiX^'is)) - Xk{X^{s)) A Afc(Zf o r]e{s))ds 
Jo 



^N 
^k 



Y [n,3 (n^N'" I Xk{Z^ o neis)) - XkiX^'is)) A Xk{Z^ o rje{s))ds 



k 



+ Nm"^ [ Xk{Zi'orje{s))-Xk{X''{s))AXk{Z^or,e{s))ds 
Jo 



'^k ) 



is a martingale, and 

F^(x) = j;iV^iV^^Afc(x)Cf. 

k 

Note that based upon our assumptions, we have that 

\F^{x)-F^{y)\<CN^\x-y\, (32) 

where C > is a constant that does not depend upon N or 7. The quadratic covariation matrix of M^ is 

k 

where 



J^,2{t) = lfc,2 (^Nm^'' ^ Afe(X^(s)) - Afc(X^(.)) A XkiZf o ^,{s))ds 
Jkd^) = n,3 (n^N^' f Xk{Z^ o r^eis)) - Afc(X^(s)) A A,(Zf o %(s))d. 
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Thus, 



and, in particular, 

E[M%,{t) = yZiClifN^'^'^E f |Afc(X^(s)) - Afc(Zf o r?,(s))| ds. 
^ Jo 



We note that 



7Nf^.\\2 



Nr^.\l2 



X'\t) - Z'^{t)y < 2\M'^t)\^ + 2 



Jo 



(33) 



(34) 



and we may handle the two terms on the right hand side of the above equation separately. 
First, by ( |33| ) and the Burkholder-Davis-Gundy inequality, 

IE[|M^(t)|2] < Y, E(C^)'^"^''=IE r |Afc(X^(5)) - Xk{Zf o r?,(5))| ds 



i k 



= yZ ICkl'Nm^^E f |Afc(X^(s)) - \u{Z^ o ry,(s))| ds 
k J^ 

< ICN^iN-fE I \X^{s) - Z^ o %(s)| ds, 
Jo 



(35) 



where C is a constant independent of N, t, and 7. After adding and subtracting Zf [s), using ( [3T] ), and 
applying Lemma |3j we conclude that for t < T 

E[|M^(t)|2] < {ciN^<Te''^^"^)N-P{N^hi), (36) 

for some constants ci , C2 that do not depend upon T, 7, or A^, and which will change during the course of 
the proof. 



Turning to the second term on the right hand side of ( [34] ), making use of ( |32| ) we have for some C > 
independent of T, 7, and N, 

E( l\F''{X''{s))-F''{Z^or^,{s))\ds 



< CN^m( I \zf o ng{s) - zf{s)\ds^ + CN^m( f \x^{s) - z^{s)\ds 



(37) 



The expected value in the first term on the right hand side of ( [37) can be bounded via 

El / \Z^ or]e{s)-Z^{s)\dsj <TE f \Z^ o r]e{s) - Z^{s)fds 



n ft-+h 

tY, I E\Zfo,^,{s)-Z^{s)\''ds. 



(38) 



We have that 



E|Zfor?,(^)-Zf(s)p<^|Ci: 



Af|2 



iYT^TCfcE j Afc(Zf or]i{r))dr 



+ Af2TAr2cfc]E/ / Xi,{Z^ o 7]g{r))dr 
\Jveis) 



(39) 
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for some constant C > 0. Combining (38 1 and (39 1 shows 



i-T \ 2 



e( f \Zf or]e{s)-Zf{s)\ds] < T^{CN^'N~Phi + CN^^hj). (40) 



Combining ( [40] ) with pT) then yields 

ft 



E 



( I \F^{X^{s)) - F^{Z^ o rieis))\ds] < ciN^''T^N~P{N^hi) + C2T^ N^^ {N^ heY' 

Jo 



(41) 



for some constants ci, C2, C3 that do not depend upon T, N, or 7. Equations ( |34| ), ( [36| ), and ( |4T] ) yield 

E[|X^(t) - Zf (t)|2] < {ciN"'Te^^^''^)N-P{N"'he) + ciiV272-2^-p(^7/i^) + c2T2iV2T(An'/j^)2 

+ C3tN^^ I mX^{s)-Z^{s)\^ds. 



/o 
The result now follows from Gronwall's inequality. D 

We turn our focus to the proof of Theorem[T] which is restated here for completeness. 

Theorem 1. Suppose {Z^ , Zf_^) satisfy ([16]) and (Vh with Z^ (0) = Z|^^(0). Then, there exist functions 
Ci , C2, that do not depend on h^, such that 

supE|Zf (i) - Zf i(i)|2 < Ci{N^'T)N-P{N''he) + C2{N^T){N^hif. 

t<T 

In particular, for 'J < the values Ci{N^T) and C2{N'^T) may be bounded above uniformly in N. 

Proof, (of Theorem [T]) A direct proof can be written along the lines of that for Theorem |2] A separate, 
cruder, proof would simply add and subtract X^ {t) to \Z^ {t) — Z^_^{t)\^ and use Theorem combined 
with the triangle inequality. D 

8 Implementation issues 

The analysis in Sections [6] and [7] specified an order of magnitude for the number of paths, n^, to be used at 
each level so as to attain the desired accuracy. This was needed to prove that the computational complexity 
can be greatly reduced with an appropriate choice of the n^. However, the analysis does not tell us what the 
nn should be with precision, nor does it tell us that these are the optimal n^, which, of course, will depend 
on the function /, and the model itself. 

Letting Vi denote the variance of Q^ for a given n^, and CPUi be the CPU time needed to generate n^ 
paths, we know that 

for some Ki as both CPUi and l/Ve scale linearly with ni. Further, for a given tolerance, e, we need 

Var(Q) = ^y, = (e/1.96)2, (42) 
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for, say, a 95% confidence interval (where the quantity 1.96 will be changed depending upon the size of the 
confidence interval desired). We may approximate each Kp with a number of preliminary simulations (not 
used in the full implementation), and then minimize 



E 






subject to the constraint ([42]). This will give us target variances, V^, for each level, and an estimate on the 
time needed until the computation is completed. We may then simulate each level until enough paths have 
been generated for the variance of the estimator at that level to be below the target Vi. Note that this is 
similar to the strategy proposed in lITSl . 

We make the important observation that with such an optimizing pre-computation, we can choose both 
L and ^o> that is the finest and crudest levels, before attempting the full calculation. This reduces the 
probability of the approximate processes becoming negative, or non-physical in other ways, during the 
course of a simulation. This, in turn, helps keep the approximate processes stable, which leads to increased 
efficiency. Further, if we find during such a pre-computation that no choice of levels and number of paths 
will be significantly faster than using an exact algorithm combined with crude Monte Carlo, then we should 
simply revert to solely using an exact algorithm. We may conclude, therefore, that the developed method 
will never, for any example, be appreciably slower than using an exact algorithm with crude Monte Carlo. 
As will be demonstrated in the next section, however, the method will often times, even in cases not yet 
predicted by the analysis, be significantly faster. 

Finally, we note that in each of the examples in Section [9} and each method tested, we use Matlab's 
built in Poisson random number generator. Further, we produce the necessary approximate paths in batches 
ranging from the 100s to 10s of thousands so as to reduce the number of separate calls to the Poisson random 
number generator. 

9 Examples 

We present three examples to demonstrate the performance of the proposed method. 

Example. We begin by considering a model of gene transcription and translation also used in lH: 

G^G + M, M^^^M + P, P + P^-^^D, mH%, p\%. 

Here, a single gene is being transcribed into mRNA, which is then being translated into proteins, and finally 
the proteins produce stable dimers. The final two reactions represent degradation of mRNA and proteins, 
respectively. We suppose the system starts with one gene and no other molecules, so X(0) = (1,0,0) 
where Xi, X2, X3 give the molecular counts of the mRNA, proteins, and dimers, respectively. Finally, we 
suppose that we want to estimate the expected number of dimers at time T = 1 to an accuracy of ±1 with 
95% confidence. Thus, we want the variance of our estimator to be smaller than (1/1.96)^ ^ .2603. We 
will also estimate the second moment of the number of dimers, which could be used in conjunction with the 
mean to estimate the variance. For comparison purposes, we will use each method discussed in this paper 
to approximate the mean, and will use an exact method combined with crude Monte Carlo and the unbiased 
MLMC method to approximate the second moment. 

While e = 1 for the unsealed version of this problem, the simulation of just a few paths of the system 
shows that there will be approximately 23 mRNA molecules, 3,000 proteins, and 3,500 dimers at time 
T = 1. Therefore, for the scaled system, we are asking for an accuracy of ?= 1/3500 ss 0.0002857. Also, 
a few paths ( 100 is sufficient) shows that the order of magnitude of the variance of the normalized number of 
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Mean 


Variance 


^xi{i) 


# paths 


CPU Time 


# updates 


3714.2 ± 1.0 


« 1,232,418 


1.5035 xlO'^±8 xlO^ 


4,740,000 


1.49 xlO^CPUS 


8.27 xlO^u 



Table 1 : Performance of an exact algorithm with crude Monte Carlo. The mean number of dimers at time 1 
is reported with 95% a confidence interval. The approximated variance of the number of dimers is provided 
for completeness. An estimate of the second moment is also provided with a 95% confidence interval. 



Step-size 


Mean 


# paths 


CPU Time 


# updates 


h = 3-' 


3,712.3 ± 1.0 


4,750,000 


13,374.6 S 


6.2 X 10^" 


h = 3-'^ 


3,707.5 ± 1.0 


4,750,000 


6,207.9 S 


2.1 X W^^ 


h = 3-^ 


3,693.4 ± 1.0 


4,700,000 


2,803.9 S 


6.9 X igy 


/i = 3-4 


3,655.2 ± 1.0 


4,650,000 


1,219.0 S 


2.6 X 10^ 



Table 2: Performance of Euler tau-leaping with crude Monte Carlo for the computation of the first moment 
of Xs, the number of dimers. The bias of the method is apparent. 

dimers is approximately 0. 1 1. Thus, the approximate number of exact sample paths we will need to generate 
can be found by solving 

1. 



n 



- Var (normalized # dimers) = (e/1.96)^ 



n 



5.18 X 10 



6 



Therefore, we will need approximately five million independent sample paths generated via an exact algo- 
rithm. 

We also note that with the rough orders of magnitude computed above for the different molecular counts 
at time T = 1, we have N ss 3,500, ai ^ .38, and 02 = as ~ 1. Therefore, we have that N'^ ^ 
23, 000/3, 000 = 7.6 =^ 7 w 0.2485 for this problem (where we chose the "stiffest" reaction for this 
calculation, which is that of M — )• M + P). However, we note that the parameter 7 changes throughout the 
simulation and is quite a bit higher near t ^ 0. 

Implementing the modified next reaction method, which produces exact sample paths [T|, on our ma- 
chine]^ (using Matlab), each path takes approximately 0.03 CPU seconds to generate. Therefore, the ap- 
proximate amount of time to solve this particular problem will be 155,000 CPU S, which is about forty 
three hours. The outcome of such a simulation is detailed in Table [T] where "# updates" refers to the total 
number, over all paths, of steps, and is used as a crude quantification for the computational complexity of 
the different methods under consideration. 

Next, we solved the problem using Euler tau-leaping with various step-sizes, combined with a crude 
Monte Carlo estimator. The results of those simulations are detailed in Table (2] Note that the bias of 
the approximate algorithm has become apparentjj We then implemented the biased version of MLMC 
with various step-sizes. The results of those simulations are detailed in Table [3j where the approximations 
and CPU times should be compared with those of Euler tau-leaping. The CPU times stated include the 
time needed to solve the embedded optimization problem discussed in Section [8] Note that the gain in 
computational complexity, as quantified by the # updates, over straight tau-leaping with a finest level of 
hi = 3^^ is 56 fold, with straight tau-leaping taking 17.1 times longer. Also note that the bias of the 
approximation method is still apparent. 

Finally, we implemented the unbiased version of MLMC with various step-sizes. The results of those 
simulations are detailed in Table |4] As in the biased MLMC case, the CPU times stated include the time 

^We used an Apple machine with 8GB Ram and an i7 chip. 
^This data also appears in |[6J. 
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Step-size parameters 


Mean 


CPU Time 


# updates 


M = 3, L = 7 


3,712.6 ±1.0 


781.8 S 


1.1 xlO^ 


M = 3, L = 6 


3,708.5 ± 1.0 


623.9 S 


7.9 xl0« 


M = 3, L = 5 


3,694.5 ± 1.0 


546.9 S 


6.6 xlO^ 



Table 3: Performance of biased MLMC with M = 3, £o = 2, and L ranging from 7 to 5. The reported times 
include the time needed for the pre-computations used to choose the number of paths per level as discussed 
in Section m 



Step-size parameters 


Mean 


CPU Time 


Var. of estimator 


# updates 


M = 3, L = 6 


3,713.9 ±1.0 


1,063.3 S 


0.2535 


1.1 xl09 


M = 3, L = 5 


3,714.7 ±1.0 


1,114.9 S 


0.2565 


9.4 xlO« 


M = 3, L = 4 


3,714.2 ±1.0 


1,656.6 S 


0.2595 


1.0 xlO^ 


M = 4, L = 4 


3714.2 ±1.0 


1,334.8 S 


0.2580 


1.1 xlO^ 


M = 4, L = 5 


3,713.8 ±1.0 


1,014.9 S 


0.2561 


1.1 xl09 



Table 4: Performance of unbiased MLMC with £q = 2, and M and L detailed above. The reported times 
include the time needed for the pre-computations used to choose the number of paths per level as discussed 
in Section m 

needed to solve the embedded optimization problem discussed in Section[8] We see that the unbiased MLMC 
estimator behaves as the analysis predicts: there is no bias for any choice of M or L, and the required CPU 
time are analogous to Euler's method with a course time-step. Further, the exact algorithm with crude 
Monte Carlo, by far the most commonly used method in the literature, demanded approximately 80 times 
more updates and 140 times more CPU time than our unbiased MLMC estimator, with the precise speedups 
depending upon the choice of Af and L. 

We feel it is instructive to give more details to at least one choice of M and L for the unbiased MLMC 
estimator. For the case with M = 3, L = 5, and Iq = 2, we provide in Table |5] the relevant data for the 
different levels. As already stated in Table|4} the total time with the optimization problem was 1,1 14.9 CPU 
S, more than the total CPU time reported in Table [5j which does not include the time needed to solve the 
optimization problem. Note that most of the CPU time was taken up at the coarsest level, as is common 
with MLMC methods and predicted by the analysis. Also, while the exact algorithm with crude Monte 
Carlo demanded the generation of almost five million exact sample paths, we needed only 3,900 such paths 
at our finest level. This difference is the main reason for the dramatic reduction in CPU time. Of course, 
we needed more than eight million paths at the coarsest level, but these paths are very cheap to generate. 
Finally, we note that the optimization problem divided up the total desired variance into non-uniform sizes 
with the more computationally intensive levels generally being allowed to have a higher variance. 

In Table |6] we provide the data pertaining to the estimate of the second moment using the unbiased 
version of MLMC with M = 3 and L = 5 that appears in the second row of Table |4] The 95% confidence 
interval for the second moment is 1.5031 xlO^ ± 9 xlO'^, which should be compared with the confidence 
interval generated using an exact method. 

Example. We turn now to a simple example that allows us to study how the behavior of the developed 
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Level 


# paths 


Mean 


Var. estimator 


CPU Time 


# updates 


{X,Z,^.) 


3,900 


20.1 


0.0658 


279.6 S 


6.8 xlO'' 


(^3-5 > ^3-4) 


30,000 


39.2 


0.0217 


49.0 s 


8.8 xlO'' 


(^3-4,^3-3) 


150,000 


117.6 


0.0179 


71.7 S 


1.5 xIQS 


(^3-3,^3-2) 


510,000 


350.4 


0.0319 


112.3 S 


1.7 xlO^ 


Euler, h = S'^ 


8,630,000 


3,187.4 


0.1192 


518.4 S 


4.7 xIQS 


Totals 


N.A. 


3,714.7 


0.2565 


1031.0 S 


9.5 xIQS 



Table 5: Details of the different levels for the implementation of the unbiased MLMC method with M = 3, 
L = 5, and £q = 2. By {X, Z3-5 ) we mean the level in which the exact process is coupled to the approximate 
process with h = 3~^, and by {Z^-i, Z^-e+i) we mean the level with Z^-t coupled to Z^-e+i. 



Level 


Estimate 


Var. of estimator 


95% Confidence inteval 


{X,Z,^,) 


157,000 


5.8 xlO^ 


N.A. 


(^3-5,^3-4) 


303,000 


2.0 X 106 


N.A. 


(^3-4,^3-3) 


894,000 


2.2 xlO^ 


N.A. 


(^3-3,^3-2) 


2.4888 xlO^ 


4.2 xlO^ 


N.A. 


Euler, h = 3~^ 


1.1188 XlO'' 


5.7 XlO** 


N.A. 


Totals 


1.5031 XlO'' 


2.0 XlO'' 


1.5031 XlO'' ±9 xlO^ 



Table 6: Details of the different levels for the implementation of the unbiased MLMC method with M = 3, 
L = 5, and io = 2 for the approximation of the second moment. 

methods depends upon the parameter 7. Consider the family of models indexed by 9, 



A^B, 



with, 



Xa{0) = Xb{0) = [1,000 ^-^J, 



where [x\ is the greatest integer less than or equal to x. The stochastic equation governing Xa, giving the 
number of A molecules, is 

XA{t) = Xa{o) + Yi (j e{2,mw-^ - XA{s))d.^ " ^' (/ ^^^(^)^*) ' 

with the Euler approximation given by 

ZA{t) = Za(0) + yi(j 0{2, 0000-1 _ ^^ ^ r?(s))ds j - ^2 ( / ^^A o v{s)ds] . 

Letting N = Xa{0), we see that 6 = N^, implying 

7 = ln(6')/ln(iV). 

Note, in particular, that 7 > <;=^ 9 > I, with 7 being a strictly increasing function of 6. Therefore, 
we may test the dependence of the behavior of the MLMC method on this model by varying the single 
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e 


Method 


Estimate 01X^(1) 


CPU time 


# Paths 


Speedup 


.1 


Crude MC 


9,999.97 ± 0.20 


1,476.9 S 


164,800 


N.A. 


.1 


MLMC: M = 3, L = 4, 4 = 


10,000.07 ±0.19 


6.5 S 


N.A. 


227.2 


.5 


Crude MC 


1,999.90 ±0.16 


1,110.9 S 


124,100 


N.A. 


.5 


MLMC: M = 3, L = 4, 4 = 1 


2,000.10 ±0.16 


15 S 


N.A. 


74.1 


1 


Crude MC 


999.96 ±0.11 


1,464.4 S 


163,800 


N.A. 


1 


MLMC: i\/ = 3, L = 5, 4 = 2 


999.99 ±0.11 


28 S 


N.A. 


52.3 


2 


Crude MC 


500.01 ±0.11 


739.9 S 


82,700 


N.A. 


2 


MLMC: M = 6, L = 6, 4 = 3 


499.96 ±0.11 


21 


N.A. 


35.2 


10 


Crude MC 


99.983 ± 0.044 


900.2 S 


100,600 


N.A. 


10 


MLMC: M = 3, L = 6, Iq = 5 


99.965 ± 0.044 


65 S 


N.A. 


13.8 


25 


Crude MC 


40.012 ± 0.028 


898.0 S 


100,500 


N.A. 


25 


MLMC: AI = 3,L = 6,io = 6 


39.996 ± 0.028 


98 S 


N.A. 


9.2 


50 


Crude MC 


20.008 ± 0.0139 


1,789.0 S 


200,200 


N.A. 


50 


MLMC: M = 3,L = 7,lo = 7 


20.005 ± 0.0138 


360 S 


N.A. 


5.0 


100 


Crude MC 


10.002 ± 0.0139 


892.6 S 


100,100 


N.A. 


100 


MLMC: M = 3, L = 7, 4 = 7 


9.988 ± 0.0138 


250 S 


N.A. 


3.6 


200 


Crude MC 


4.9996 ± 0.0088 


1,120.3 S 


125,400 


N.A. 


200 


MLMC: M = 3, L = 7, 4 = 7 


4.9958 ± 0.0087 


486 S 


N.A. 


2.3 


500 


Crude MC 


2.0029 ± 0.0044 


1,781.6 S 


199,400 


N.A. 


500 


MLMC: M = 3, L = 7, 4 = 7 


1.9953 ±0.0044 


1,625.9 S 


N.A. 


1.1 


1,000 


Crude MC 


1.0038 ±0.0043 


897.2 


100,200 


N.A. 


1,000 


MLMC: M = 3,L = 7,iQ = 7 


1.0015 ±0.0044 


1,412.3 S 


N.A. 


0.64 



Table 7: Approximation of X^(l) with 95% confidence intervals. Note that the speedup factor decreases as 
9 increases, with MLMC becoming less efficient than an exact method when 9 = 1, 000. 

parameter 6. We will let 9 range from 0.1 to 1, 000 and for each 9 use both an exact method with crude 
Monte Carlo and the unbiased MLMC method developed here to estimate the mean number of A molecules 
at time 1. We choose different values for our tolerance parameter, e, for different values of 9. The results of 
these computations can be found in Table [7] 

The analysis of this paper predicts that as 9 increases, MLMC should progressively lose its compu- 
tational advantage over an exact algorithm, and this is borne out in the data provided in Table |7] Note, 
however, that the unbiased version of MLMC remains significantly more efficient than an exact algorithm 
until 9 = 1, 000, in which case Xa{0) = 1. Having 9 = 1,000 is arguably the worst case scenario for an 
approximate algorithm such as tau-leaping, and the coupling method performs slightly worse than using an 
exact method with crude Monte Carlo. As discussed in Section [8j we would normally in this case simply 
use the exact method alone. However, we report the MLMC data for the sake of comparison. Further, it is 
extremely encouraging that there were still large gains in efficiency even when 9 G {25, 50, 100}, some- 
thing not predicted by the current analysis. Interestingly, the speedup factor of MLMC over an exact method 
appears, for this example, to be a function of 9. Specifically, as demonstrated by the log-log plot in Figure 
[T] we observed the relation 

Speedup factor ^ 54.6 6l"°-^2. 
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Figure 1 : Log-log plot for the speedup factor of unbiased MLMC over an exact method. The best fit line, 
not shown, is 4 — 0.62x. 



Example. We finish with a model of viral kinetics first developed in 113611 by Yin et al., and subsequently 
studied by Haseltine and Rawlings in f23l, Ball et al., in fE], and E et al., in f\A]. One reason the interest 
in this model has been so high from the biological, engineering, and mathematical communities is that it 
exemplifies a feature of many stochastic models arising in the biosciences: a separation of time scales. We 
will use this model to demonstrate that one of the main ideas of this paper, the coupling, is not restricted to 
the use of approximate methods defined using time-discretizations. 

The model includes four time-varying "species": the viral genome (G), the viral structural protein (S), 
the viral template (T), and the secreted virus itself (V). We denote these as species 1, 2, 3, and 4, respec- 
tively, and let Xi{t) denote the number of molecules of species i in the system at time t. The model involves 
six reactions. 



1 : 


T^T + G, 


Kl = 1, 


2 : 


G^T, 


K2 = 0.025, 


3 : 


T^T + S, 


K3 = 1000, 


4 : 


r^0, 


K4 = 0.25, 


5 : 


5^0, 


K5 = 2, 


6 : 


G + S^V, 


KQ = 7.5 X 10 
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where the units of time are in days. The stochastic equations for this model are 

Xi{t) = Xi(0) + yi(f X3{s)ds] - Y2 ("0.025 / Xi{s)ds 

- Ye (7.5 X 10-6 / Xi{s)X2is)ds\ 

X2{t) = X2(0) + ^3 flOOO / X3{s)ds] - ^5 (2 /" X2{s)ds] 

- Yq (7.5 X 10^6 / Xi{s)X2is)ds\ 

X?Xt) = ^3(0) + Y2 fo.025 / Xi{s)ds\ - Yi (0.25 / X^{s)ds\ 
Xi{t) = ^4(0) + ^6 (7.5 X 10^6 / Xi{s)X2{s)ds\ . 



Following fl4], we assume an initial condition of X(0) = (0,0, 10,0)*. We see that whenever the 
number of viral templates is positive, that is whenever X3 > 0, the rates of the third and fifth reactions will 
be substantially larger than the others. At the times when X3 > and X2 = 0(1), we have that 7^1, 
with 7 remaining large until X2 = 0(1000). However, even when X2 = 0(1000), the natural time-scale 
of the problem is 0(1/1000), whereas the time-scale in which we would like to answer questions is 0(1). 

Instead of implementing our MLMC method directly, we take an alternative approach that makes use of 
the idea of the coupling, though not the multi-level aspect of the paper. That is, we will build an approximate 
process Z that will be used as a control variate for X. Towards that end, note that when the number of 
templates is positive, reactions 3 and 5 are much faster than the others. Ignoring the other reactions, we see 
that when X3 > 0, the "system" governing the dynamical behavior of S is 

1000X3(4) 

2 

which has an equilibrium distribution that is Poisson with a parameter of 500X3(t), see [4]. Believing that 
we may use this mean value of X2{s) in the integrated intensity of reaction 6, that is 

/ Xi{s)X2{s)ds « / Xi(s)(500X3(s))ds, (43) 

Jo Jo 

we hope a good approximate model for G, T, and V, which we denote by Z = (Zi, Z3, Z4) so as to remain 
consistent with the enumeration of X, is 

Zi{t) = Xi{0)+Yi U Z3{s)dsj - Y2 (0.025 / Zi{s)ds 



(44) 



-Yq (3.75 X 10"^ / Zi{s)Z3{s)ds] 
Z3{t) = ^3(0) +y2 (0.025 / Zi{s)ds\ - Y4 (0.25 / Z3{s)ds 

Zi{t) = ^4(0) +y6 (3.75 X 10"^ / Zi{s)Z3{s)ds\ . 

Note that while Z is an approximate model of X, it is still a valid continuous time Markov chain satisfying 
the natural non-negativity constraints. In particular, there is no time-discretization parameter in Z, which 
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is where many technical problems related to tau-Ieaping (stability concerns, negativity of molecular counts, 
etc.) arise. 



We will now couple the two processes in a manner similar to ( p2| ) and build our estimator. Let (k denote 
the reaction vector for the kth reaction. Let X^iX) = 7.5 x 10~^XiX2, and Ae^Z) = 3.75 x lO'^ZiZ^. 
For arbitrary /, we can estimate ¥,f{X{T)) via 

Ef{X{T)) = E{f{X{T)) - f{Z{T))) + Ef{Z{T)), (45) 



where E/(Z(r)) is estimated by crude Monte Carlo using the representation ( |44| ), which is relatively cheap 
to simulate, and we estimate K{f{X{T)) — f{Z{T))) using independent realizations from the coupled 
processes (X, Z) below 

X{t) = X(0) + Yi,i (j min{X3(s), Z:,{s)}d^ Ci + >1,2 (j ^3(s) - Tmn{X^{s), Z^{s)}d^ Ci 

+ ^2,1 (0.025 j min{Xi(s),Zi(s)}dsj (2+^^2,2 (o.025 j Xi(s) - min{Xi(s), Zi(s)}ds j C2 

+ yJ 1000 J X3{s)ds]c3 

+ n,! (o-25/' min{X3(s), Z3{s)}{s)ds^ (4 + >i,2 (^0.25 / X^is) - mm{X3{s), Z3{s)}{s)ds^ ^4 



^5(2 J X2{s)ds](:5 



real / mm{X(,{X{s)),A(,{Z{s))}ds]c6-Ye,2( f X6{X{s))-mm{X(,{X{s)),Ae{Z{s))}ds]i:e 



/ V^o 

t \ / ft 



Z{t) = ri,i (j xmn{X^{s),Z^{s)}d^ Ci +n.3 (j ^3(5) -min{X3(s),Z3(s)}ds 1 Ci 

+ 1^24 (0.025 / min{Xi(s),Zi(s)}dsj (2+^^2,3 ( 0.025 / Zi(s) - niin{Xi(s), Zi(s)}ds j C2 

+ ^44 (o-25 / min{X3(s),Z3(s)}(s)ds') (4 + >"4.3 (o-25 / ^3(5) - min{X3(s), Z3(s)}(s)ds') (4 

+ 1^64 fy min{A6(X(.s)),A6(Z(s))}ds') Ce - i"6,3 f^ A6(Z(s)) - niin{A6(X(s)), A6(Z(s))}ds') (e, 

(46) 

where the Yfc^i's are independent, unit-rate Poisson processes. Note that we have coupled the process 
through the reaction channels 1, 2, 4, and 6, in the usual way, though not through 3 or 5, which are not 
incorporated in the model for Z. Simulation of the coupled processes, which is itself just a continuous time 
Markov chain in ^>q, may proceed by any exact algorithm. Here we used the next reaction method ll2l[T6l. 
Supposing we want to estimate EX4(20), giving the mean number of virus molecules at time 20, we 
calculate this value using both a naive application of the next reaction method with crude Monte Carlo, and 



the control variate approach of ( [45] ) with the coupling ( [46| ). The details of the two computations are found 
in Table[8] We see that the crude Monte Carlo implementation required 60 times more updates and 22 times 
more CPU seconds than the control variate/coupling approach, again demonstrating the usefulness of the 
core ideas of this paper. 

10 Conclusions 

This work focused on the Monte Carlo approach to estimating expected values of continuous time Markov 
chains. In this context there is a trade off between the accuracy and the cost of each Monte Carlo sample. 
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Method 


Approximation 


# paths 


CPU Time 


# updates 


Crude Monte Carlo 


13.85 ± 0.07 


75,000 


24,800 CPU S 


1.45 X 10^*^ 


Coupling 


13.91 ± 0.07 


N.A. 


1,118.5 CPUS 


2.41 X 10« 



Table 8: Details of the approximated expected virus level at time 20 using crude Monte Carlo with an exact 



algorithm and a control variate approach using the coupling (46 1. 



Exact samples are available, but these are typically very expensive, especially in our target application of 
biochemical kinetics. Approximate samples can be computed by tau-leaping, with the bias governed by a 
discretization parameter, h. A realistic analysis of the cost of tau-leaping must acknowledge the importance 
of system scaling. In particular, for a fixed system, in the limit /i — )• tau-leaping becomes infinitely 
more expensive than exact sampling, since it needlessly refines the waiting times between reactions. In this 
work, we studied tau-leaping in a general setting that incorporates system scaling without taking asymptotic 
limits. Motivated by the work of Giles 1 18] on diffusion processes, we then introduced a new multi-level 
version of the algorithm that combines coordinated pairs of tau-leaping paths at different h resolutions. 
The two main conceptual advances in this work were (a) pointing out the practical benefits of a coupling 
process that had previously been introduced solely as a theoretical tool, and (b) exploiting the availability 
of an exact sampling algorithm to give an unbiased estimator. Our theoretical analysis of the computational 
complexity showed that the new algorithm dramatically outperforms the existing state of the art in a wide 
range of scaling regimes, including the classical scaling arising in chemical kinetics. The new algorithm is 
straightforward to summarize and implement, and numerical results confirmed that the predicted benefits 
can be seen in practice. 

There are several avenues for future work in this area, including, 

• using Quasi-Monte Carlo sampling to improve practical performance, 

• customizing the method in the context of multi-scale or hybrid models, where it is possible to exploit 
special structure in the form of fast/slow reactions or species, or where the discrete space Markov 
chain is coupled to diffusion or ODE models, 

• extending the theoretical analysis to the 7 > regime, in order to explain why we continued to 
observe excellent results in practice, 

• using the coupling idea without discretization to obtain a control variate method that exploits specific 
problem structure, as illustrated in the third example of section |9] 
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